Abstract

Background: The purpose of the project was to delineate a series of contiguous neighbourhood-based "Data 
Zones" within the Region of Peel (Ontario) for the purpose of health data analysis and dissemination. Zones were 
to be built on Census Tracts (N = 205) and obey a series of requirements defined by the Region of Peel. This paper 
explores a method that combines statistical analysis with ground-truthing, consultation, and the use of a decision 
tree. 

Data: Census Tract data for Peel were derived from the 2006 Canadian Census Master file. 

Methods: Following correlation analysis to reduce the data set, Principal Component Analysis was applied to the 
data set to reduce the complexity and derive an index. The Getis-Ord Gi* statistic was then applied to look for 
statistically significant clusters of like Census Tracts. A detailed decision tree for the amalgamation of remaining 
zones and ground-truthing with Peel staff verified the resulting zones. 

Results: A total of 15 Data Zones that are similar with respect to socioeconomic and sociodemographic attributes 
and that met criteria defined by Peel were derived for the region. 

Conclusion: The approach used in this analysis, which was bolstered by a series of checks and balances 
throughout the process, gives statistical validity to the defined zones and resulted in a robust series of Data Zones 
for use by Peel Public Health. We conclude by offering insight into alternative uses of the methodology, and 
limitations. 



Background 

Independent of individual characteristics, it is recog- 
nized that an individual's immediate environment pos- 
sesses both material and social characteristics that are 
linked to health status as well as health-seeking beha- 
viours [1-3]. That is, health reflects both individual char- 
acteristics, as well as the characteristics of the 
neighbourhood which constrains and enables individual 
health. For example, neighbourhoods may provide 
important information and support with regard to 
health practices and behaviours, but may also be asso- 
ciated with poor health in cases where crime is higher 
or the physical environment is poorer [4]. Concurrently, 
there is a common need for health status and related 
data to be represented at the 'neighbourhood' scale, 
whether it is for the provision of social welfare programs, 
planning, or health care delivery. 

Geographers have long been concerned with defining 
neighbourhoods and places, and examples of techniques 
to define neighbourhoods abound in the academic literature 
[see, for example: 2, 3, 5-10]. Weden et al. [10], for example, 
discuss the evolution and theoretical foundations, including 
links to public health issues, associated with neighbourhood 
classification, starting with 
the Chicago School. However, there are many 
approaches to defining zones, ranging from simple cases 
that are based on existing or historical neighbourhoods, 
school catchment zones, and communities, to more 
complex approaches including hierarchical clustering 
and scale-space approaches [see, for example, 11-13]. 
But even the so-called 'simple' cases can have fuzzy 
boundaries that are not agreed upon by residents and 
authorities alike, and new suburban communities may 
not self-identify as a cohesive neighbourhood, meaning 
that how areas are defined has been approached 
differentially based on the application. Most use various 
measures, such as poverty or educational attainment, 
that are derived from statistical organizations (such as 
Statistics Canada or the US Census Bureau), and repre- 
sent a proxy for health outcomes. For example, the City 
of Toronto's 'Major and Minor' Health Planning Areas 
used the proportion of the population living in low 
income at the Census Tract level [2]. In Scotland, the 
identification of data zones was based on the Townsend 
Deprivation Index [9]. The City of Ottawa, Canada, ana- 
lyzed physical and demographic characteristics of neigh- 
bourhoods through the so-called 'wombling technique' 
that analytically grouped areas based on statistical simi- 
larities, with the results providing an approximation of 
neighbourhoods [8,14]. They then went on to use a 
combination of ground-truthing, spatial analytical tech- 
niques, and GIS to define neighbourhoods. However, 
the wombling technique itself may be subject to valida- 
tion inconsistencies based on the starting point of the 
analysis. Similarly, the use of simple, additive structures 
or the reliance on one particular population attribute to 
identify similar areas has also been criticized. 

Despite attention and numerous papers on the topic, 
there is no one ideal (or recognized) way to define 
neighbourhoods and their spatial boundaries, and a lack 
of consensus remains as to the empirical definition of 
neighbourhoods. Often times, however, zones are con- 
structed to reflect or identify differences in health across 
space [i.e., 1-3, 6-8]. But health is defined by more than 
just personal health and access to health care services. 
For example, the Determinants of Health framework 
[15,16], which represents a synthesis of public health 
and social science literature, holds that issues such as 
lifestyle options (e.g., drinking, smoking, physical activity), 
nutrition, housing, work, education, and income, as well 
as mechanisms related to societal power, social identity, 
social status, and control over life circumstances, are 
influential in the distribution of health. It suggests that 
these various place-based effects influence health at the 
neighbourhood scale [4,17]. Since they can be used to 
help contextualize and define neighbourhoods empiri- 
cally, as opposed to more intuitive or theoretical con- 
ceptualizations [6], these place-based effects have 
formed the core of multiple papers on neighbourhood 
definition. 

Multivariate techniques, geographic information sys- 
tems (GIS), and spatial analytical (SA) techniques 
further enable understanding of neighbourhoods and 
their geography. For instance, GIS enables the visualiza- 
tion of neighbourhoods, while spatial analysis and clus- 
ter detection techniques such as the Getis-Ord Gi* 
statistic [18] provide a statistically robust way to identify 
areas that share statistically similar characteristics by 
identifying clusters of census tracts with values higher in 
magnitude than might be expected by random chance. If 
such statistical techniques are coupled with expert opi- 
nion and a clear decision process on boundary place- 
ment, approaches that use a mix of techniques may 
provide better area-based definitions. Ultimately, these 
neighbourhoods can be used to further understand 
health (or other) differences across space, and the rela- 
tionship between place and health. 

The question at hand is how to appropriately define 
aggregate neighbourhoods ('Data Zones') in the Region 
of Peel, Ontario. The project was initiated by Peel Public 
Health, who contacted the research team in mid-2009. 
The overall purpose of the project was to delineate a 
series of contiguous Data Zones within the Region for 
the purpose of health data dissemination. The use of the 
term 'Data Zones', as opposed to neighbourhoods, was 
preferred, since neighbourhoods typically have some 
degree of social identification associated with them and 
are frequently geographically smaller than the areas that 
would ultimately be identified in this project. The 
desired outcome, as requested by Peel Public Health, 
was to accomplish the following three goals: 

• Develop a methodology for defining Data Zones 
within the Region of Peel while accounting for sociode- 
mographic and socioeconomic effects; 

• Use the Data Zones to describe selected health 
issues and outcomes across space; 

• Analyze and report findings, such as the differences 
in health outcomes between spatial areas. 

The resulting Data Zones are not intended to facilitate 
the delivery of services, but to identify relationships 
between inequalities in neighbourhoods and health dis- 
parities, with Peel Public Health using the zones as a 
communications vehicle; for reporting to people who 
have an interest in certain geographic areas; for planning 
purposes at the strategic level; and for following relevant 
trends over time. 

The research team was therefore charged with developing 
a methodology to delineate internally homogeneous 
Data Zones using geographic data based on the 2006 
Census, together with relevant software, and using Census 
Tracts as the existing boundaries from which to build 
the zones. The purpose of this paper is therefore to 
illustrate a multivariate-structured technique [19] for the 
derivation of a series of Data Zones in the Region of 
Peel, Ontario. Following the selection of variables used 
to characterize and contextualize Census Tracts relative 
to health outcomes, GIS and spatial analysis techniques 
were used to map and construct Data Zones within the 
Region using the Getis-Ord Gi* statistic [18]. The Gi* 
statistic identifies 'hot-spots' or statistically significant 
clusters of similar Census Tracts, providing a statistically 
robust definition of neighbourhoods. The delineation of 
Data Zones is further facilitated by a structured decision 
tree approach, 'ground-truthing' with staff from Peel, 
and the overlay of existing neighbourhoods, road and 
other physical landforms to ensure appropriate repre- 
sentation and delineation of the zones. As such, the 
methodology to define zones is a heuristic approach, 
rather than an optimization method utilized in other 
studies, but one that provides a robust way to define 
zones. 

Data 

Lying to the west and northwest of the City of Toronto, 
the Region of Peel is part of the Greater Toronto Area 
(GTA) and the Toronto Census Metropolitan Area 
(CMA) (Figure 1). Peel is home to 1.1 million people 
(2006), making it the second largest municipality in 
Ontario, and includes the cities of Mississauga and 
Brampton and the towns of Caledon and Bolton. The 
Region can be roughly described as predominately 
urban and characterized by comparatively new large- 
tract suburban developments, although the northern 
portion (Caledon) of the region is predominately rural 
agricultural. As a regional government body, Peel is 
responsible for the services and infrastructure related to 
water delivery and wastewater treatment, policing, planning, 
and public health, amongst other services. 

Given its proximity to Toronto, employment opportu- 
nities, and accessibility (home to Pearson International 
airport and served by seven '400' multi-lane, limited 
access highways), the region's population has grown 
rapidly. Between 2001 and 2006, the region grew by 
17%, adding slightly more than 170,000 people to the 
population. Nearly 50% of Peel's population was born 
outside Canada (immigrants), with approximately 
120,000 arriving between 2001 and 2006 alone. Large 
immigrant or visible minority groups (based on 2006 
data) include South Asians (272,760), Filipino (42,900), 
Chinese (54,285), and Blacks (95,565). Other immigrant 
groups include South East Asian, West Asian, Latin 
American, Japanese, and Korean communities. Much of 
this new population is housed in new, low density sub- 
urban style development. Approximately 46% of the 
region's population report a non-English/non-French 
mother tongue. The median after-tax income in Peel 
(2005) was greater than that of the overall province 
($Cdn62,181 versus $Cdn52,117, respectively), and the 
Region has a generally well-educated population, with 34% 
of the population aged 25 to 34 having a certificate, diploma, 
or degree [20]. 

Peel has a rich geography that is defined through mul- 
tiple existing neighbourhoods or service planning 
boundaries, including older communities that continue 
to retain their identity and electoral boundaries. Existing 
planning and service delivery areas include Peel's 'Family 
of Schools' areas (used by the Peel District School 
Board), Forward Sortation Areas (used by Canada Post 
and as a basis for the Social Planning Council of Peel's 
'Portraits of Peel'), Local Health Integration Networks, 
and Community Health Centre boundaries. Additionally, 
the Region is divided into a number of statistical zones, 
including Census Tracts (N = 205), which are small, 
relatively stable geographic areas that typically have a 
population of 2,500 to 8,000, and dissemination areas 
(400-700 people), both of which are defined by Statistics 
Canada. 

The challenge in this work was how to express these 
varied geographies and summarize the diverse sociode- 
mographic and socioeconomic profiles of the Region. In 
the first instance, Census Tracts were used as the build- 
ing blocks for the Data Zones given stated preferences 
by Peel Public Health and ease of data availability at this 
scale. In the second instance, and following a review of 
the relevant literature [e.g., 8] and requests by Peel Public 
Health, a set of variables was initially considered for 
inclusion in the analysis that the research team and Peel 
staff felt expressed Peel's diversity. Variables requested 
for consideration by Peel included: % with no knowledge 
of English or French; % aged 25+ years who completed 
less than high school; % recent immigrants; and % low- 
income population, all of which are used in comparable 
studies. All variables were derived from the 2006 Census 
and based on the 20% Master data file from Statistics 
Canada. In addition, the research team suggested a 
number of other variables linked to population health 
outcomes, including % unemployed, % visible minority, 
% labour force aged 15+, and average number of per- 
sons in a household. Other variables initially considered 
in the analysis included alternate measures of income 
(i.e., median income and after tax income). 

From the initial list of 21 variables suspected by the 
Research Team to be likely indicators of health or 
socio-economic status, the number was reduced to 11 
through correlation analysis (at the census tract scale) 
using SAS 9.2. In cases where two variables were highly 
correlated with each other (indicating that they are 
likely measuring the same outcome), one variable was 
removed from further consideration. While a specific 
correlation value above which variables were excluded 
was not used in the current analysis, the preference was 
to retain variables favoured by Peel and/or those supported 
by literature and that are linked to health 
outcomes. Variables retained in the analysis are defined 
in Table 1, and are consistent with those typically found 
in the literature and used for similar purposes. As such, 
we are explicitly acknowledging that no one, single vari- 
able could effectively summarize all neighbourhoods, 
with selected variables reflecting the literature on the 
determinants of health and relationships between envir- 
onment and health outcomes [21]. For instance, poor 
housing conditions are commonly associated with poor 
health outcomes, and may alter the factors underlying 
health status both directly and indirectly, such as 
through the presence or absence of social support 
mechanisms [22-24]. Similarly, the amount spent on 
housing has been linked to health outcomes, with 
families that are forced to spend significantly more on 
housing potentially sacrificing other health-related 
needs, including food and health care [25]. Immigrants, 
and particularly visible minorities, have also been shown 
to have poorer health than the broader Canadian popu- 
lation, reflective of various barriers to care including 
language difficulties [26,27], knowledge of health care 
services, and socio-cultural roles [28]. Recent immigrant 
groups are also less likely to access health services in 
Canada than Canadian-born citizens [29,30], and are 
less aware of preventative health services [31]. Unem- 
ployment, low socioeconomic status, and low educa- 
tional attainment are also commonly associated with 
poor health outcomes related to stress, inadequate 
knowledge of health care options and healthy lifestyles, 
and lack of income for health related activities [32-36]. 
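
The correlation-based screening of candidate variables described above can be sketched in a few lines. This is a minimal illustration only: the original analysis was run in SAS 9.2 and, as noted, did not apply a fixed cut-off, so the `threshold` value, the function names, the variable names, and the tract values below are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def screen_variables(data, preferred, threshold=0.9):
    """Drop one variable from each highly correlated pair.

    data      : dict of variable name -> list of tract-level values
    preferred : names to keep when a pair is redundant
    threshold : hypothetical cut-off (the paper did not use a fixed value)
    """
    dropped = set()
    for a, b in combinations(data, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(data[a], data[b])) >= threshold:
            # Keep the preferred variable; otherwise keep the first one listed.
            dropped.add(a if (b in preferred and a not in preferred) else b)
    return [name for name in data if name not in dropped]


# Hypothetical tract-level values for illustration only (not Census data).
tracts = {
    "pct_low_income": [8.0, 12.0, 20.0, 15.0, 5.0, 25.0],
    "median_income": [70.0, 62.0, 45.0, 55.0, 78.0, 38.0],
    "pct_renters": [10.0, 30.0, 22.0, 40.0, 18.0, 26.0],
}
retained = screen_variables(tracts, preferred={"pct_low_income"})
print(retained)
```

Here the two income measures are nearly collinear, so only the preferred one survives, mirroring the stated preference for variables favoured by Peel or supported by the literature.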

Methods 

In defining Data Zones within the Region, Peel Public 
Health requested that the following issues be consid- 
ered: 

Table 1 Variables included in Principal Components Analysis

Factor             Definition
Housing            % Renters
                   % Owner households spending 30% or more of household income on major payments
                   % Households in need of major repairs
Socioeconomic      % Aged 20+ with no High School
                   % Unemployed (Prior to May 16th, 2006)
                   % Low Income (Before tax, 2005)
Sociodemographic   % No Knowledge of English or French
                   % Separated or Divorced
                   % Widowed
                   % Recent immigrants (Immigrated to Canada between 2001 and Census Day, May 16, 2006) (Census, 2006)
                   % Lone Female Parent Family

Note: All variables derived from the 2006 Census. 
♦ Approximately 12-14 Data Zones were to be 
defined, with populations of approximately 80,000 to 
100,000. The exception to this request was in the 
northern portion of the Region of Peel (the commu- 
nity of Caledon), which is predominantly rural and 
therefore has a smaller population density. Peel 
reserved the right to redefine the population thresh- 
old for zones following the initial analysis; 

♦ Data zones were to be contiguous, follow Census 
Tract boundaries, and avoid cases where zones were 
encircled by other zones; 

♦ Data zones were to follow boundaries that corre- 
spond to areas of interest for other purposes; 

♦ Data zones were to focus more on the composition 
of the local population when defining neighbour- 
hood boundaries, rather than neighbourhood context 
[i.e., 13, 31, 32]; 

♦ Where plausible, it was requested that the Data 
Zones respect natural and human-made boundaries, 
such as rivers and highways. Several such barriers 
exist within the Region of Peel, including rail lines, 
the Credit River which extends northwest through 
Mississauga, and limited-access highways including 
highways 401, 403, 407, 410, and the Queen Eliza- 
beth Way (QEW) which dissect the region. In most 
cases, census tracts already follow these boundaries. 
In cases where they do not, census tracts must still 
form the boundary. 

In practice, it was not possible to satisfy all these cri- 
teria, and compromise was necessary. Most commonly 
(and as noted below), population constraints were 
waived in consultation with Peel staff given anticipated 
future growth trends. 
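
Two of the requirements listed above, contiguity and the population target, lend themselves to a mechanical check. The sketch below is illustrative only: the adjacency table, tract identifiers, and population figures are hypothetical, and the check does not cover the other criteria (for example, avoiding encircled zones or respecting natural boundaries).

```python
from collections import deque


def is_contiguous(zone, adjacency):
    """True if every tract in `zone` can be reached from any other tract
    through shared boundaries without leaving the zone (breadth-first search)."""
    zone = set(zone)
    if not zone:
        return False
    start = next(iter(zone))
    seen, queue = {start}, deque([start])
    while queue:
        tract = queue.popleft()
        for neighbour in adjacency.get(tract, ()):
            if neighbour in zone and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == zone


def check_zone(zone, adjacency, population, low=80_000, high=100_000):
    """Summarize whether a candidate zone meets the two checked criteria."""
    total = sum(population[t] for t in zone)
    return {
        "contiguous": is_contiguous(zone, adjacency),
        "population": total,
        "within_target": low <= total <= high,
    }


# Hypothetical tracts A-D arranged in a chain: A-B-C-D.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
population = {"A": 30_000, "B": 35_000, "C": 25_000, "D": 6_000}

print(check_zone(["A", "B", "C"], adjacency, population))
print(check_zone(["A", "C"], adjacency, population))
```

Because the population thresholds were waived in places, a failed `within_target` flag would in practice prompt consultation rather than automatic rejection.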

Differences in perceptions and definitions imply that 
neighbourhoods mean different things to different peo- 
ple (see Luginaah et al. [6] for a review). Although there 
is disagreement in the literature concerning the best 
way to capture the concept of a neighbourhood, Census 
Tracts provide one option. At the same time, the use of 
Census Tracts has frequently been criticized because 
their statistically defined boundaries may not necessarily 
be related to other social processes 
or perceptions of what a neighbourhood includes, 
reducing the power of a neighbourhood as a meaningful 
concept [37-40]. On the other hand, other studies argue 
that Census Tracts are good proxies of neighbourhoods 
[1,3] as compared to socially constructed areas, which 
are often loosely defined and lack the ability to link to 
other statistical data. Indeed, the comparison of several 
neighbourhood units of analysis suggests that Census 
Tracts are good proxies for natural neighbourhood 
boundaries in studies of neighbourhood effects on 
health [3]. Moreover, defining neighbourhoods by using 
Census Tracts (or groups of Census Tracts) offers a 
number of advantages, including direct linkage to statistical 
measures provided by Statistics Canada. 

Following the initial selection of the variables, principal 
component analysis (PCA) with a varimax orthogonal 
rotation was used to summarize variables and build 
indices, a practice commonly used to consolidate informa- 
tion along main dimensions and that has been widely used 
in defining zones similar to the aims of this work [i.e., 8, 
41, 42, 43]. While other zoning exercises have constructed 
an index based directly on the weighted variables, indices 
constructed in this way may be misleading by missing 
inter-relationships between variables, and/or fail to 
account for a more complete set of potential indicators. 
The central idea of PCA is to reduce the dimensionality of 
a data set which consists of a large number of interrelated 
variables, while retaining as much as possible of the variation 
present in the data set [42], allowing the determination 
of which tracts could be combined to form relatively 
homogeneous areas. PCA allows for the extraction of components 
that reflect the pattern of the inter-correlations of 
the variables, while searching for commonalities. Only factors 
that contributed more than 10% of the variation 
were retained for further analysis. 
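
To make the PCA step concrete, the following is a minimal NumPy sketch of a correlation-matrix PCA with the 10% retention rule and a varimax rotation. It is not the SAS procedure used in the study: the data are synthetic, the function names are illustrative, and the varimax routine is a common textbook formulation rather than the specific implementation the authors ran.

```python
import numpy as np


def pca_on_correlation(data):
    """Eigen-decompose the correlation matrix of `data` (rows = tracts)."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]          # largest eigenvalue first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    proportion = eigenvalues / eigenvalues.sum()   # share of total variance
    return eigenvalues, eigenvectors, proportion


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x components) matrix."""
    n_vars, n_comp = loadings.shape
    rotation = np.eye(n_comp)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                          # product of orthogonal matrices
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation


rng = np.random.default_rng(0)
# Synthetic stand-in: 205 "tracts" x 6 variables driven by two latent factors.
latent = rng.normal(size=(205, 2))
mixing = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                   [0.0, 0.9], [0.1, 0.8], [0.2, 0.7]])
data = latent @ mixing.T + 0.3 * rng.normal(size=(205, 6))

eigenvalues, eigenvectors, proportion = pca_on_correlation(data)
keep = proportion > 0.10                           # retention rule from the text
loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
rotated_loadings = varimax(loadings)
print(np.round(proportion, 3))
print(np.round(rotated_loadings, 2))
```

An orthogonal rotation redistributes variance between the retained components but leaves each variable's communality (the row sum of squared loadings) unchanged, which is a convenient sanity check on any rotation routine.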

While PCA assists with the identification of the 
sources of variation, it does not help in understanding 
the spatial patterning of the components. Following 
PCA, therefore, the next step was to create the bound- 
aries for the zones based on the PCA scores assigned to 
each Census Tract for each factor created by PCA. For 
this purpose, a Getis-Ord Gi* hot-spot analysis [18] was 
run on the resulting sets of Factor Scores. The statistic 
works by looking at each tract within the context of 
neighbouring tracts: if a tract's value is high (low), and 
the values for the neighbouring tracts are also high 
(low), it is a part of a so-called 'hot spot'. For each PCA 
factor, the Gi* statistic identifies the association between 
a Census Tract and its neighbours up to a specified dis- 
tance, or in terms of nearest neighbours where the CT 
shares a boundary. The Gi* statistic is well-suited to 
identify the existence of pockets or clusters of areas 
(tracts) with values higher in magnitude than might be 
expected by random chance and their statistical signifi- 
cance; to assess assumptions of stationarity (i.e., that 
spatial relationships are the same at all places in the 
study area); and to determine distances beyond which 
no discernible spatial association exists [18]. Impor- 
tantly, the Gi* statistic identifies clusters that can be 
used to statistically delineate zones. The output of the 
Gi* function is a z-score for each feature, with the z- 
score representing the statistical significance of cluster- 
ing for a specified distance, and the higher (or lower) 
the z-score, the stronger the association. A z-score near 
zero indicates no apparent concentration. 
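
The z-score form of the Gi* statistic can be sketched directly from its standard definition: for feature i, the weighted local sum of values is compared with its expectation under the global mean and scaled by the global standard deviation. The function below is a plain-Python illustration under that definition; the toy values and the binary "within one step" weights are assumptions for the example and are unrelated to the Peel data.

```python
from math import sqrt


def getis_ord_gi_star(values, weights):
    """Gi* z-score for every feature.

    values  : list of attribute values x_j (e.g. a factor score per tract)
    weights : square matrix w[i][j]; w[i][i] is non-zero because Gi*
              includes the feature itself in its own neighbourhood
    """
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        w_sum = sum(weights[i])
        w_sq_sum = sum(w * w for w in weights[i])
        local_sum = sum(w * v for w, v in zip(weights[i], values))
        numerator = local_sum - mean * w_sum
        denominator = std * sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


# Toy example: ten features on a line; neighbours lie within one step.
values = [10, 10, 10, 1, 1, 1, 1, 1, 1, 1]
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(10)] for i in range(10)]
z = getis_ord_gi_star(values, weights)
print([round(score, 2) for score in z])
```

In this toy series the middle of the high-valued run (index 1) has a z-score of approximately 3.0, while a feature surrounded by low values (index 6) has a negative z-score that does not reach the 1.96 magnitude associated with p < 0.05.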
Results 

Two major factors emerged from the Principal Components 
Analysis (Table 2). Together they explained approximately 65% 
of the variance, and both are often associated with 
poor health outcomes within the determinants of health 
literature. The first principal component, which 
explained 45% of the variance, is labelled as 'low socioe- 
conomic status' and includes the variables indicating a 
high recent immigrant population, no knowledge of 
either English or French, percent unemployed, no high 
school, and low income. The second component, which 
explained an additional 20% of the variance, was labelled 
as 'single renter'. The variables that loaded into this 
component included Separated/Divorced Single Parent 
Families with high rates of Rental Housing tenure, need 
for Major Repairs, and Low Income before Tax status. 
The percentage variance explained by a third compo- 
nent (9.5%) was marginally less than the threshold typi- 
cally used with PCA [42]. In addition, the variables that 
loaded highly into the third principal component 
(widowers and individuals without high school educa- 
tion), did not appear to be indicative of any particular 
social group, and were already included in component 1 
(education) or component 2 (widower status). As such, 
it was not considered further in the current analysis. 

Two methods of extraction for the resulting factors 
were tested. In the first method, following work done by 
Primpas et al. [43], a weighted index was created for the 
highest-scoring rotation correlations in each factor, 
rescaled so as to sum to a value of 1. These final rota- 
tion correlations are highlighted in Table 3. Factor 
scores were generated for each Census Tract by multi- 
plying each census variable by its respective rescaled 
rotation correlation. The resulting summed value of 
these variables represents that Census Tract's score out of 
a possible 100. The second 
method of extraction was equally accurate (i.e., 
it produced the same final results). 
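
The weighted-index method described above can be illustrated as follows. The loadings are the first-component rotation correlations reported in Table 3, but which of them were treated as "highest-scoring" is not stated in the text, so the selection of five variables here is an assumption, as are the variable names and the tract percentages.

```python
# Component 1 rotation correlations as reported in Table 3.
# The choice of these five variables is an assumption for illustration.
loadings = {
    "pct_recent_immigrant": 0.878143,
    "pct_no_english_or_french": 0.843414,
    "pct_low_income": 0.814993,
    "pct_unemployed": 0.743667,
    "pct_no_high_school": 0.467324,
}

# Rescale the loadings so that the weights sum to 1.
total = sum(loadings.values())
weights = {name: value / total for name, value in loadings.items()}


def index_score(tract_percentages, weights):
    """Weighted sum of a tract's percentage variables (0-100 scale)."""
    return sum(weights[name] * tract_percentages[name] for name in weights)


# Hypothetical tract: percentages are made up, not Census values.
tract = {
    "pct_recent_immigrant": 12.0,
    "pct_no_english_or_french": 4.0,
    "pct_low_income": 15.0,
    "pct_unemployed": 6.0,
    "pct_no_high_school": 20.0,
}
score = index_score(tract, weights)
print(round(score, 2))
```

Because every input is a percentage and the weights sum to 1, the score is bounded by 0 and 100, which is what allows it to be read as "a score out of a possible 100".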

Table 2 Final Component Eigenvalues

Eigenvalues (CORR)

Component   Eigenvalue   Difference   Proportion   Cumulative
1           4.972620     2.767540     0.4521       0.4521
2           2.205080     1.164828     0.2005       0.6525
3           1.040252     0.233820     0.0946       0.7471
4           0.806432     0.145497     0.0733       0.8204
5           0.660934     0.201995     0.0601       0.8805
6           0.458939     0.176109     0.0417       0.9222
7           0.282830     0.069393     0.0257       0.9479
8           0.213437     0.062486     0.0194       0.9673
9           0.150951     0.020085     0.0137       0.9810
10          0.130866     0.053206     0.0119       0.9929
11          0.077660                  0.0071       1.0000
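
The Proportion and Cumulative columns of Table 2 follow directly from the eigenvalues: for a correlation-matrix PCA with 11 variables the eigenvalues sum to 11, so each proportion is the eigenvalue divided by that total. The short check below reproduces the arithmetic and applies the 10% retention rule; the variable names are illustrative.

```python
# Eigenvalues as listed in Table 2.
eigenvalues = [4.972620, 2.205080, 1.040252, 0.806432, 0.660934, 0.458939,
               0.282830, 0.213437, 0.150951, 0.130866, 0.077660]

total = sum(eigenvalues)                     # about 11, the number of variables
proportions = [ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Components contributing more than 10% of the variance (1-based numbering).
retained = [i + 1 for i, p in enumerate(proportions) if p > 0.10]

for i, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(i, round(p, 4), round(c, 4))
print("retained:", retained)
```

The third component, at roughly 9.5% of the variance, falls just under the threshold, which is consistent with its exclusion from the analysis.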



Table 3 Varimax Rotated Variable Rotation Correlations

Rotation Correlations (Structure)

Variable                                 Component 1   Component 2
% Separated or Divorced                  0.070103      0.910449
% Widowed                                0.082087      0.487650
% Renters                                0.478132      0.766992
% Households in need of major repairs    0.015941      0.752548
% No English or French                   0.843414      -0.327763
% Unemployed                             0.743667      0.222037
% No High School                         0.467324      0.207493
% Low Income (Before tax, 2005)          0.814993      0.448588
% owners spending 30% or more            0.836553      0.204874
% Recent Immigrant                       0.878143      0.131458
% Single Mothers                         0.291777      0.743309



Upon performing a rotation, SAS outputs a value for 
each record representing how strongly each Census 
Tract scores in each of the rotated principal compo- 
nents. While both approaches yield similar visual results, 
with overlays of the two component values defining 
similar areas, the second method was utilized. 

The two PCA factors were used as input to the Gi* 
analysis (Figures 2 and 3). A Manhattan Distance measurement 
was applied to reflect the predominately urbanized 
nature of the region, and a Fixed Distance Band using 
the default distance of approximately 11 kilometres was 
ultimately applied. Only areas with a p value smaller 
than 0.05 (95% confidence) for either component were 
retained. In Figure 2, statistically significant spatial clustering 
of high Low Socioeconomic Status index values 
was found along the eastern extent of Peel, while the 
north and south contain statistically significant spatial 
clustering of low Low Socioeconomic Status index 
values. In Figure 3, statistically significant spatial clustering 
of Single Renter index values was found along the 
southeast waterfront and in the southwest and northwest 
areas of Peel. By analyzing patterns of discreteness 
and overlap between the two indices, Data Zones could 
be delineated based on these clusters. 
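
The settings described above (Manhattan distance, a fixed distance band, and a p < 0.05 cut-off) can be sketched as follows. This is an illustration rather than the GIS tool actually used: centroid coordinates are hypothetical and assumed to be in kilometres, and the p-value is the usual two-sided normal probability for a z-score.

```python
from math import erfc, sqrt


def manhattan(a, b):
    """Manhattan (city-block) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def fixed_band_weights(centroids, band):
    """Binary weights: 1 when two centroids lie within `band`, else 0.
    A feature is at distance 0 from itself, so the diagonal is 1."""
    return [[1 if manhattan(p, q) <= band else 0 for q in centroids]
            for p in centroids]


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal."""
    return erfc(abs(z) / sqrt(2))


def is_significant(z, alpha=0.05):
    """True when the z-score is significant at the given alpha level."""
    return two_sided_p(z) < alpha


# Hypothetical tract centroids (km) and the roughly 11 km band from the text.
centroids = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
band_weights = fixed_band_weights(centroids, band=11.0)
print(band_weights)
print(is_significant(1.96), is_significant(1.5))
```

A weights matrix built this way can be passed to any Gi* routine that accepts a square neighbour matrix; only tracts whose resulting z-scores pass `is_significant` would be carried forward as cluster members.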

Once the Gi* was computed and mapped for both 
PCA components, Data Zones could be delineated. As a 
first step, groups of Census Tracts that were statistically 
significant for either of the mapped components (low 
socioeconomic status and single renter) became the 
building blocks for a Data Zone. That is, there is statisti- 
cal robustness based on the Gi* statistic for grouping 
these zones based on their similarity. 

While the Getis-Ord Gi* analysis allows the determi- 
nation of 'hot-spots', portions of the region remained 
un-identified, including much of central Peel. That is, 
the Gi* analysis did not find statistically significant clus- 
ters for either factor in much of the central portion of 
the region. To derive Data Zones in these areas, a 
Figure 2 Low Socioeconomic Status Index (Gi* p < 0.05 clusters). 



decision tree was created to allow for a reliable, repeata- 
ble delineation process that avoids personal subjectivity. 
The resulting decision tree (Figure 4) explains how the 
process of delineation followed a set of decisions based 
upon clustered factor scores, population counts, and 
community inclusiveness. 

The general logic of the decision tree was based on 
two general streams. In cases where the Gi* analysis 
identified hot-spots, these clusters were compared with 
known hard boundaries such as roads or other features, 
and checked to ensure that they met other criteria such 
as population size. For the remaining portions of the 
Region that needed to be defined (i.e., those areas that 
were not defined as clusters by the Gi* statistic), we first 
turned to the DMTI Neighbourhood and Community 
Boundaries file [44], a "continually updated" set of 
neighbourhood boundaries as determined by "amalga- 
mating and integrating information from municipal data 
sources" [45]. These DMTI-based boundaries were over- 
laid with the initial zones, allowing Data Zone bound- 
aries to be initially constructed based on known neigh- 
bourhoods, while referencing population counts for each 
potential zone and ensuring that the constructed zones 
remained contiguous. 
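
The two-stream logic described above can be expressed as a small function. This is a simplified reading of the prose, not a transcription of the decision tree in Figure 4: the dictionary keys, the returned labels, and the order of the checks are all hypothetical, and the population bounds are the initial targets, which were waived in places.

```python
def next_action(candidate, min_pop=80_000, max_pop=100_000):
    """Suggest the next delineation step for a candidate group of tracts.

    `candidate` is a dict with hypothetical keys:
      significant_cluster   - Gi* flagged the group for either component
      follows_hard_boundary - edges coincide with highways, rivers, rail lines
      matches_neighbourhood - group lines up with a known neighbourhood file
      contiguous            - all tracts in the group are connected
      population            - total population of the group
    """
    if not candidate["contiguous"]:
        return "regroup: zone must be contiguous"
    if candidate["significant_cluster"]:
        # Stream 1: start from the Gi* hot-spot and check hard boundaries.
        if not candidate["follows_hard_boundary"]:
            return "adjust: compare cluster edge with hard boundaries"
    else:
        # Stream 2: no significant cluster, so use known neighbourhoods.
        if not candidate["matches_neighbourhood"]:
            return "adjust: align with neighbourhood boundaries"
    if not (min_pop <= candidate["population"] <= max_pop):
        return "review: population outside target range"
    return "accept: carry forward to ground-truthing"


# Hypothetical candidates for illustration.
hot_spot = {"significant_cluster": True, "follows_hard_boundary": True,
            "matches_neighbourhood": False, "contiguous": True,
            "population": 92_000}
central = {"significant_cluster": False, "follows_hard_boundary": False,
           "matches_neighbourhood": False, "contiguous": True,
           "population": 85_000}
print(next_action(hot_spot))
print(next_action(central))
```

Writing the rules down in this form is one way to keep the delineation repeatable: the same inputs always lead to the same next step, and judgment calls are deferred to the ground-truthing stage.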

Once the initial set of contiguous zones was gener- 
ated, a physical approach was used to refine zonal 
boundaries through two 'ground-truthing' methods. 
First, we referenced known boundaries, including physi- 
cal features such as highways and streams, to determine 
if more 'natural' boundaries separating zones might be 
Figure 3 Single Renter Status Index (Gi* p < 0.05 clusters) 



warranted, echoing Pickett and Pearl's [46] call for 
meaningful neighbourhoods that are based on natural 
boundaries. In this case, we assumed that such barriers 
differentiate areas through the physical division of space, 
such as separating neighbourhoods so that there is 
reduced interaction, or by separating places with differ- 
ent socioeconomic and sociodemographic profiles. Con- 
sequently, natural boundaries serve both functional 
purposes such as transport or recreation, as well as 
creating barriers between different groups [e.g., 47, 48]. 
Roads and highways were obtained from DMTI's Route 
Logistics file (2008), which contains highways and roads 
for all of Canada, albeit clipped to the boundaries of the 
Region of Peel [49]. Visible land features were obtained 
from the Satellite Streetview Orthophoto dataset created 
by the 60 cm resolution Quickbird Satellite and released 
by DMTI Spatial [50]. The result is a spatial file with 
the different overlays (zones, neighbourhoods, roads, 
physical landforms), along with the zones delineated by 
the Gi* statistic, neighbourhood boundaries, and other 
"hard boundaries" (i.e., transportation) in Peel. Compari- 
son of these boundaries identified any anomalies 
through consideration of both land features and physical 
boundaries. Throughout, total population counts for 
each potential zone were verified. The resulting 'shape'
of the derived Data Zones was not an issue in the analysis:
the imposition of the various constraints (statistical
significance from the Gi* statistic, number of derived zones,
known boundaries such as roads or physical features, and
population counts) meant that any attempt to constrain the
shape of the Data Zones was less meaningful. A total of 13
zones were identified at
this stage of the analysis. 
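
The contiguity and population checks described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in the study: the tract identifiers, the adjacency list, the population counts, and the `min_pop` threshold are all hypothetical, and the helper names `is_contiguous` and `check_zone` are introduced here for illustration.

```python
from collections import deque


def is_contiguous(tracts, adjacency):
    """Return True if the tracts form a single connected block.

    `adjacency` maps each tract id to the ids of tracts sharing a border.
    """
    tracts = set(tracts)
    if not tracts:
        return False
    start = next(iter(tracts))
    seen = {start}
    queue = deque([start])
    # Breadth-first search restricted to tracts inside the candidate zone.
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour in tracts and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == tracts


def check_zone(tracts, adjacency, population, min_pop):
    """Summarise a candidate zone: total population and constraint checks."""
    total = sum(population[t] for t in tracts)
    return {
        "population": total,
        "contiguous": is_contiguous(tracts, adjacency),
        "meets_minimum": total >= min_pop,
    }


# Hypothetical tracts: A-B-C form a chain, D is isolated.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
population = {"A": 4000, "B": 5500, "C": 3000, "D": 6000}

report_ok = check_zone(["A", "B", "C"], adjacency, population, min_pop=10000)
report_split = check_zone(["A", "C"], adjacency, population, min_pop=10000)
```

In this toy example the zone built from tracts A and C fails both checks, because the two tracts touch only through B and their combined population is below the assumed minimum.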

Second, we presented the results to Peel Public Health 
for their expert input on the defined boundaries. Peel
staff, including GIS technicians, planners, and public
health officials, participated in two round-table discussions
where interim results were presented. Through
their more detailed knowledge of current and future 
population trends, socioeconomic profiles, and develop- 
ment within the region, participants critically analyzed 
the methodology and outcomes, and commented on 
potential anomalies or disagreements with the resulting 
divisions. These exercises resulted in the division of
Caledon to create Data Zones 13 (West Brampton) and
15 (Bolton) at the request of Peel staff. In the first
instance (zone 13), Peel's Official Plan notes the short- 
term housing and commercial development of the West 
Brampton area, with rapid population growth expected 
within a five-year window. Although the area was still 
largely rural (as of 2010) and therefore more similar to
Caledon, the imminent population growth and development
meant that Peel staff felt it was more suitable to
present it as a separate zone rather than amalgamate 
with Caledon as suggested by the statistical analysis, 
enabling future flexibility with the zones. In the second 
case, the community of Bolton (Data Zone 15) was 
separated from the northeast portion of Brampton, 
again reflecting the uniqueness of the Bolton area (rela- 
tive to the rural areas immediately around Bolton), and 
the potential for substantial short-term population 
growth, even though its 2006 population count (22,719) 
also falls below the threshold originally suggested for 
the definition of the zones. Although these modifications
ran counter to the clustering results and to the initial
constraints (the populations of the two new Data Zones were
below the minimum size initially requested by Peel, so
population was not equitably distributed across the zones),
Peel staff felt that they better provided for the future
growth of Peel's population and for more consistent zones
over the longer term. In addition to consultation with Peel
staff, Peel also used the final derived Data Zones to produce
maps of various health outcomes as an internal check of
their validity.

The analysis, 'ground-truthing', and expert input
exercises resulted in the set of Data Zones shown
in Figure 5. Populations ranged from 22,719 (Bolton) to
106,064 (East Brampton) with boundaries that respect 
the various natural and physical delimiters in the region. 

Conclusion 

Through a series of mixed methods, a set of Data Zones
was delineated for the Region of Peel, Ontario, based
on existing Census Tracts. It is hoped that as further
census and health outcome data become available, and
given that Peel's population continues to grow and
become more diverse, the delineated zones can be
verified and refined for future analyses.

The approach used in this paper is flexible and bol- 
stered by a series of checks and balances throughout the 
process, including statistically defined clusters
of like Census Tracts identified with the Gi* statistic,
which gives statistical validity to the defined zones. In
addition, the use of a formal 'decision tree' to assist in 
the determination of zones, along with the recognition 
of local community boundaries, physical land features 
such as major roads or landscapes, and the knowledge 
of local health experts, resulted in a robust set of Data 
Zones for use by Public Health in the Region of Peel. 
Consequently, the methodology to define zones illustrated
in this paper draws upon a number of inputs, with the end
result being a more robust and meaningful set of zones.

The methodology has a number of advantages, 
enabling it to be applied elsewhere and in different con- 
texts. First, the method can be adjusted based on the 
desired output such as size/number of zones, their con- 
stituent building blocks, or even the inclusion (exclu- 
sion) of the initial statistical steps such as PCA. For 
example, the PCA could be removed as a step.
Instead, input for the Gi* analysis could be
based on other existing inputs (e.g., an individual variable
such as low income status) or indices, such as the
UK's Townsend Index of Deprivation [9,51]. Following
the Gi* analysis, which would identify clusters of like
areas based on these alternate inputs, statistically similar 
zones could once again be identified through the use of 
the decision tree, consultation, and expert opinion. Sec- 
ond, the approach is applicable to both research and 
practical applications such as health surveillance. Third, 
the approach can be scaled up or down to other geogra- 
phical contexts. Fourth, the consultative process and use 
of ancillary data removes concerns that the zones are 
only representative of the statistical process and the 
building blocks (Census Tracts) that underlie the zones. 
In essence, the proposed methodology increased partici- 
pation in the analysis, and ultimately improved the defi- 
nition of the resulting Data Zones to reflect local 
knowledge. 
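
To make the alternative input concrete, the following minimal sketch computes the standard Getis-Ord Gi* z-score for a single variable using binary weights within a fixed distance band. It is an illustration under stated assumptions rather than the implementation used in this study: the coordinates, values, and band width are hypothetical, the function name `gi_star` is introduced here, and a production analysis would normally rely on established GIS software.

```python
import math


def gi_star(values, coords, band):
    """Getis-Ord Gi* z-scores with binary fixed-distance-band weights.

    A unit counts as its own neighbour (the 'star' in Gi*). Units whose
    band covers every observation get a score of 0.0 because the
    statistic is undefined there.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        # Weight 1 for every unit within `band` of unit i, else 0.
        w = [1.0 if math.dist(coords[i], coords[j]) <= band else 0.0
             for j in range(n)]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Hypothetical units along a line: high values on the left, low on the right.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
low_income_rate = [10, 9, 8, 1, 2, 1]
z = gi_star(low_income_rate, coords, band=1.0)

# Units with |z| > 1.96 would be flagged as clusters at p < 0.05.
hot = [i for i, score in enumerate(z) if score > 1.96]
cold = [i for i, score in enumerate(z) if score < -1.96]
```

Positive scores mark concentrations of high values and negative scores concentrations of low values; only units exceeding the chosen significance threshold would be carried forward to the decision tree.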

At the same time, the practice of using aggregated 
spatial data as the basis for creating larger areal units is 
a technique associated with potential errors, biases, and 
oversights - regardless of the context or application. 
First, the creation of socially-based spatial aggregations 
can be used to misrepresent those living within an area, 
either intentionally as in gerrymandering political dis- 
tricts to subdivide sizable voting populations, or unin- 
tentionally through irresponsible analysis. Caution must 
therefore be exercised in the use of expert opinion. Con- 
sequently, the decision tree is an important component 
of the work, providing a platform from which to evalu- 
ate changes to the set of zones. 

Second, although Census Tracts were requested by 
Peel Public Health to be the building blocks for the ana- 
lysis, there are reliability issues with using such a large 
spatial area as the building block for even larger Data 
Zones. It is recognized within spatial science that as an 
aggregated area increases in size, the recognized var- 
iance of the characteristics of the population within the 
area declines [52]. By generalizing the characteristics of 
a population with some kind of areal unit, potentially 
important variances within the defined zones are hid- 
den. By using Census Tracts as opposed to smaller dis- 
semination areas (for which the same census 
information is available), important variations in the
population composition of the Region may be overlooked.
The modifiable areal unit problem typifies this
[52,53], reminding researchers that "the areal units 
(zonal objects) used in many geographical studies are 
arbitrary, modifiable, and subject to the whims and fan- 
cies of whoever is doing, or did, the aggregating" [53, p. 
102]. Because of this, whenever attempting to subdivide 
an area based on the assumed similarities of those living 
there, care must be taken to ensure that the generalized 
areas most accurately represent the people living within 
their borders, maximizing the differences between units, 
while minimizing the differences within them [54]. Simi- 
larly, the use of a fixed distance band with the Gi* sta- 
tistic, while useful in the urban portions of Peel, may be 
somewhat less relevant in the rural (northern) portion, 
again potentially altering the definition of the Data 
Zones. In other words, it is important to realize that the 
processes that create population clusters are unlikely to 
operate at only one geographic scale, but are instead 
shaped by complex interactions. Consequently, further 
work may look at the strengths and weaknesses of the 
proposed methodology. 
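
The loss of variance under aggregation noted above can be illustrated with a toy calculation. The values below are hypothetical and stand in for an attribute measured on eight small areal units; the `aggregate` helper is introduced here for illustration and simply averages consecutive units, which is a simplification of how real zones are built.

```python
from statistics import mean, pvariance


def aggregate(values, size):
    """Average consecutive groups of `size` values into larger areal units."""
    return [mean(values[i:i + size]) for i in range(0, len(values), size)]


# Hypothetical attribute values for eight small areal units.
small_units = [10, 30, 20, 40, 15, 35, 25, 45]

pairs = aggregate(small_units, 2)   # four medium-sized units
quads = aggregate(small_units, 4)   # two large units

# Variance between units at each level of aggregation.
var_small = pvariance(small_units)
var_pairs = pvariance(pairs)
var_quads = pvariance(quads)
```

The overall mean is unchanged at every level, but the variance between units shrinks as the units grow, which is the sense in which larger building blocks can hide variation within the defined zones.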